ALPE as LT4eL processing chain environment

Dan Cristea
Faculty of Computer Science, Al. I. Cuza University of Iaşi, Romania; Institute for Computer Science, Romanian Academy, Iaşi, Romania
dcristea@info.uaic.ro

Corina Forăscu
Faculty of Computer Science, Al. I. Cuza University of Iaşi, Romania; Research Institute for Artificial Intelligence, Romanian Academy, Bucharest
corinfor@info.uaic.ro

Ionut Cristian Pistol
Faculty of Computer Science, Al. I. Cuza University of Iaşi, Romania
ipistol@info.uaic.ro

Abstract
This paper briefly describes the concept, initial implementation and usage of the ALPE¹ system for natural language processing. A hierarchy connecting annotation schemas, processing tools and resources is used as the working environment for the system, which can perform various complex NL processing tasks. ALPE will be used to build linguistic processing chains involving the annotation formats and tools developed in the LT4eL² project. The particularities and advantages of such an endeavor are the main topics of this paper.

Keywords
XML annotation, processing architectures, e-learning, linguistic processing systems, multilinguality

1. Introduction
One of the

among the available modules. The two most prominent systems of this type are GATE³ and IBM's UIMA⁴.
GATE is a versatile environment for building and deploying NLP software and resources, allowing for the integration of a large number of built-ins in new processing pipelines that receive as input single documents or corpora. The user can configure new architectures by selecting the desired modules from a repository pool, as parts of a chain. The configured chain of processes may be put to work on an input file and the result is an output file, XML annotated.
UIMA is a promising new release of IBM Research (first freely available version: June 2007). It offers the same general functionalities as GATE, but once a processing module is integrated in UIMA it can be used in any further chains without any modifications (GATE
requires wrappers to be written to allow two new modules to be connected in a chain). Also, UIMA allows the user to work with various annotation formats and perform various additional operations on annotated corpora. Since the release of UIMA, the GATE developers have made available a module that allows GATE and UIMA processing modules to be interchangeable, basically merging the “pool” of modules available.
ALPE is another approach to the task of developing an LP meta-system, offering more flexibility than existing systems. ALPE is based on the hierarchy of annotation schemas described in [ ]. In this model, XML annotation schemas are nodes in a directed acyclic graph, and the hierarchical links are subsumption relations between schemas. In [ ] it is described how the graph may be augmented with processing power by marking edges linking parent nodes to daughter nodes with processor names, each realizing an elementary NL processing step. On the augmented graph, three operations are

latest developments in Computational Linguistics, and one which promises to have a significant impact on future linguistic processing systems, is the emergence of linguistic annotation meta-systems that make use of existing processing tools and implement some sort of processing path, pipelined or otherwise.
The “wild” diversity of formats, tools and resources scares off a newcomer or less informed user who needs to configure an NLP (Natural Language Processing) architecture to solve a given task. Configuration and parametrisation of processing chains is also very time consuming due to the heavy documentation the component modules come with. The more sophisticated the task, the more likely it is that it requires complex pre-processing steps involving several other NLP systems, which have to be chosen, documented and interfaced.
The newly emerged linguistic processing meta-systems make use of existing modules in building LP chains, use existing linguistic resources and allow the user to add/build
new ones, and also allow the user to compare and choose

¹ Automated Linguistic Processing Environment
² Language Technology for e-Learning, IST 027391, http://www.lt4el.eu/
³ http://gate.ac.uk/
⁴ http://www.research.ibm.com/UIMA/

defined: simplification, pipeline and merge. A navigation algorithm is described in this hierarchy, which computes paths between a start node, corresponding to an input file, and a destination node corresponding to an output file. To these computed paths relate sequences of operations, which are equivalent to architectures of serial and parallel combinations of processors. When an input file is given to a system that implements these principles, and the requirements of an output annotation are specified as the destination node, first the XML annotation schema of the input file is determined, then this schema is classified onto the hierarchy, becoming the start node, then the expression of operations corresponding to the minimum paths linking the start node to the destination node is computed (the architecture). Finally the input file can be given to this architecture, resulting in the expected output file.
Section two of this paper presents the theory behind the ALPE system, and briefly describes the current state of development. Section three describes how the ALPE

2.2 The hierarchy augmented with processing power
In NLP, the needs for reusability of modules, and language and application independence impose the reuse of specific modules in configurable architectures. In order for the modules to be interconnectable, the module’s inputs and outputs must observe the constraints expressed as annotation schemas.
When we place processes on the edges of the graph of linguistic metadata, the hierarchy of annotation schemas becomes a graph of interconnecting modules. More precisely, if a node A is placed above a node B in the hierarchy, there should be a process which takes as input a file observing the restrictions imposed by the schema A and produces as
output a file observing the restrictions imposed by the schema B.
We will call a graph (or hierarchy) of annotation schemas on which processing modules have been marked on edges as being augmented with processing power (or simply, augmented). The null process, marked Ø, is a module that leaves an input file unmodified.

2.3 Building the hierarchy
Three hierarchy building operations are used in our model: initialize-graph, classify-file and integrate-process. They are described below.
The initialize-hierarchy operation receives no input and outputs a trivial hierarchy formed by a ROOT node (representing the empty annotation schema).
Once the graph is initialised, its nodes and edges are contributed by classifying documents in the hierarchy.
The classify-file operation takes an existing hierarchy and a document marked with metadata observing a certain schema and classifies the schema of the document within the hierarchy. The operation results in an updated hierarchy and the location of the input schema as a node of the hierarchy. If the input document fully

system will be used in the framework of the LT4eL European project. The conclusions, as well as the further planned developments, are described in section four.

2. ALPE
2.1 Linguistic metadata organised in a hierarchy
The basis of our model is a directed acyclic graph (DAG) which configures the metadata of linguistic annotation in a hierarchy of XML schemas. Nodes of the graph are distinct XML annotation schemas, while edges are hierarchical relations between schemas. Users’ interactions with the graph can modify it from an initial trivial shape, which includes just one empty annotation schema, up to a huge graph accommodating a diversity of annotation needs. If there is an oriented edge linking a node A with a node B in the hierarchy (we will also say that B is a descendant of A) then the following conditions hold simultaneously:
• any tag-name of A is also in B;
• any attribute in the list of attributes of a tag-name in A is
also in the list of attributes of the same tag-name of B.
As such, a hierarchical relation between a node A and one descendant B describes B as an annotation schema which is more informative than A. In general, either B has at least one tag-name which is not in A, and/or there is at least one tag-name in B such that at least one attribute in its list of attributes is not in the list of attributes of the homonymous tag-name in A. We will agree to use the term path in this DAG with its meaning from the support graph, i.e. a path between the nodes A and B in the graph is the sequence of adjacent edges, irrespective of their orientation, which links nodes A and B. As we will see later, the way this graph is being built triggers its property of being fully connected. This means that, if edges are seen as undirected, there is always at least one path linking any two nodes.

complies with a schema described by a node of the hierarchy, the latter remains unchanged and the output indicates this existing node; otherwise a new node, corresponding to the annotation schema of the input document, is inserted in the proper place within the hierarchy.
Integrate-process is an operation aiming to properly attach processes to the edges of a hierarchy of annotation schemas, mainly by labeling edges with processors, but sometimes also by adding nodes and edges and labeling the connecting edges.
An ALPE type hierarchy can either be defined completely manually, or partially manually and partially completed with the operations described above, or only using the above operations. The ALPE hierarchy variant developed for the LT4eL project is fully constructed manually, as its purpose is testing the functionalities offered, and less the building procedure.

specifications of the start node (schema) onto a file observing the specifications of the destination node (schema).

2.4 Operations on the augmented graph
Three main operations can be supported by the model, as follows.
If an edge linking a node A to a node B
(therefore B being a descendant of A) is marked with a process p, we say that A pipelines to B by p. Equally, when a file corresponding to the schema A is pipelined to B by p, it will be transformed by the process p into a file that corresponds to the restrictions imposed by the schema B. This results in augmenting the annotation of the input file (observing the restrictions of the schema A) with new information, as described by schema B.
For any two nodes A and B of the graph, such that B is a descendant of A, we will say that B can be simplified to A. When a file corresponding to the schema B is simplified to A, it will lose all annotations except those imposed by the schema A. Practically, a simplification is the opposite of a (series of) pipeline operation(s).
The merge operation can be defined in nodes pointed to by more than one edge of the hierarchical graph. It is not unusual that the edges pointing to the same node are labelled by empty processors. The merge operation applied

Once the entry and exit points in the hierarchy have been determined and translation links have been devised, all the rest is done by the hierarchy itself, augmented with processing power in the manner described above. This way, the processing needed to arrive from the input to the output is computed by the hierarchy as sequences of serial and parallel processing steps, each of them supported in the hierarchy by means of specialised modules. Then the process itself is launched on the input file. It includes an initial translation phase, followed by a sequence of simplifications, pipelines and/or merges, as described by the computed path, and followed by a final translation, which is expected to produce the output file.

2.5 Features
In this section we will describe a set of features, important for environments working with linguistic resources and tools, that emerge from the proposed model. These features are important especially when considering the proposed integration of resources and tools
belonging to the LT4eL project in an ALPE type hierarchy.

Multilinguality
Usually the adaptation of a module to process a certain natural language is given by the specific set of resources it accesses. For instance, a POS-tagger runs the same algorithms on different sets of language models in order to tag documents for parts of speech in different languages. To take another example, a shallow parser applies a set of regular expressions, which are language dependent, in order to identify chunks. In both cases the processing modules are language independent and only the specific language model or the specific set of rules makes them applicable to the language L1 or L2.
To realise multilinguality within the proposed model means to map the edges of the augmented graph onto a collection of repositories of configuring resources (language models, sets of grammar rules, etc.) which are specific to different languages. This can be achieved if the

to files corresponding to parent nodes combines the different annotations contributed by these nodes into one single file corresponding to the schema of the emerging node.
With these operations, the graph augmented with processing power is useful in two ways: for goal-driven, dynamic configuration of processing architectures and for transforming metadata attached to documents. Automatic configuration of a processing architecture is the result of a navigation process within the augmented graph between a start node and a destination node, the resulting processes being combinations of branching pipelines (serial simplifications, processing and merges). The difference with respect to GATE and UIMA, both allowing only pipeline processing in which the whole output of the preceding processor is given as input to the next processor, is that in our model the required processing may result in a combination of branching pipelines. This is due to the introduction of the merge operation, which is able to combine two different annotations on the same file. Once
the process is computed, it can be applied to an input file displaying a certain metadata in order to produce an output file with the metadata changed as intended. These two files comply with the restrictions encoded by the start node and, respectively, the destination node of the hierarchy.
Since the graph is fully connected, there should always be at least one path connecting any two nodes. The paths found are made up of oriented edges and, depending on whether the orientation of the edges is the same as that of the path or not, we will have pipeline operations or simplification operations. The flow of paths between the start and the destination node configures the processing combination that transforms any file observing the

edges of the graph labeled with processes are indexed with indices corresponding to languages. This way, for each particular language an instance of the graph can be generated, in which all edges keep one and the same index, the one corresponding to that particular language. This means that all processors of that particular language should access the configuring resources specific to that language in order for the hierarchy to work properly. For instance, in the graph instance of language Lx, the edge corresponding to a POS-tagger has Lx as index, meaning that it accesses a configuring resource file that is specific to language Lx: the Lx language model (Figure 1).
It is a fact that different languages have different sets of processing tools developed, English being perhaps the richest at present. Ideally, the lack of a tool in a specific language should be attributed to the lack of the corresponding configuring resource, once a language independent processing module is available for that task. It is also the case that differences exist in processing chains among

document to nodes of the hierarchy. When a computed architecture is run over an input file (corresponding to a start node), the output file (corresponding to the destination node) will be
indexed identically with the input file.

Manual versus automatic annotation
While automatic annotation is supported by the graph, how can manual annotation be accommodated by the approach? Usually, in order to train processing modules in NLP, developers use manually annotated corpora. To create such corpora, they make use of annotation tools configured to help place XML elements over a text, and to decorate them with attributes and values. As such, if annotation tools do, although in a different way, the same jobs which can be performed by processing modules, it is most convenient to associate them with edges in the graph in the same way in which processing modules are associated with these edges. Meanwhile, it is clear that manual annotation cannot be chained in complex processing architectures in the same way in which automatic annotation can. In order to differentiate between automatic and manual processes between pairs of schemas observing the descendant relation, edges should have facets, for instance AUT and MAN. Under the AUT facet of a POS-tagging edge, for instance, the automatic POS-tagger should be placed, while under the MAN facet, the POS-tagging annotation

languages. For instance one language could have a combined POS-tagger and lemmatizer while another one realizes these operations independently, pipelining a POS-tagger with a lemmatization module. These differences are reflected in particular instances of sections of the graph which, although they reproduce the same set of nodes, allow only certain edges linking them. The missing edges inhibit pipelining operations along them, but are suited for simplification operations.

[Figure 1: Processes along edges are language indexed. A node A (input observing schema A) is linked by the descendant relation to a node B (output observing schema B); the edge carries a process p, indexed pLx, which uses the resources of the language Lx.]

Distributivity and access
Edges, as recorders of processors, can be seen as Web services, and can therefore be physically supported by servers
anywhere on the virtual space. Similarly, documents (the files attached to nodes) could be physically located in different locations than the hierarchy itself. This way, the whole augmented graph of annotation schemas could be distributed over the Web. However, as the unique access gate, a portal holding a representation of the entire graph, on which classification and navigation operations can be performed, must exist. By manual configuration and/or repeated classification accesses, the graph grows. Navigation accesses are also initiated by users and run on the portal. They leave the graph constant while returning the computed architectures, to be executed mainly remotely from the portal, by activating chains of processors which are not all located on the same machine, but which are pointed to by edges of the graph.

Versioning of language resources
Each document can have multiple annotations, in correspondence with the nodes of the hierarchy. While some of the language resources may have been created by

tool should be placed.
The configuration files of these tools can usually be separated from the tools themselves. We can say that the corresponding configuration files particularize the annotation tools, which label edges of the graph, in the same way in which language specific resources particularize processing modules.

IPR and cost issues
Intellectual property rights can be attached to documents and modules as access rights. Only a user whose profile corresponds to the IPR profile of a resource/tool can have access to it. As a result, while computation of processing chains within the hierarchy is open to anybody, the actual access to the dynamically computed architectures could be denied to users not corresponding to certain IPR profiles of certain component modules or resources they need.
More than that, some price policies can be easily implemented within the model. For instance, one can imagine that the computation of a path results also in a computation of a cost,
depending on the particular fees the chained Web servers charge for their services, on the load of some service providers, etc.
From this, one can also imagine the graph as including more than one edge between the same two nodes in the hierarchy. This can happen when different modules performing the same task are reported by different contributors. When these modules charge fees for their services, an optimization calculus over the set of paths that can be computed for a transformation, with respect to the overall price, is also foreseeable.

human annotators (therefore being taken as gold standards), others can be automatically created, some even using the augmented hierarchy. More than that, different versions of the same hub document may correspond to just one node in the hierarchy, mainly by being created both ways (manually and automatically). The versioning problem could be accommodated by the model through an indexing mechanism similar to the language indexing of edges, by allowing the attachment of different versions of the same

Facing the diversity of annotation styles
It is a fact that nowadays a huge diversity of annotation variants circulates and is being used in diverse research communities. We are far from believing that a Procrustean bed policy could ever be imposed in the CL or NLP community, aiming for a strict adoption of standards for the annotated resources. On the other hand, it is also true that efforts towards standardization are continually being made

2.6 ALPE vs GATE and UIMA
Since ALPE, GATE and UIMA are systems capable of performing similar tasks, the significant differences and, most importantly, the advantages of ALPE over the other two are presented below.
First of all, ALPE is intended primarily to facilitate user interaction with the system, allowing the common user to access integrated resources and tools. In ALPE as a standalone linguistic processing environment, the user is presented with a visual representation of a hierarchy of annotation
formats and has basically three main choices: he can either add a new resource to the hierarchy, add a new processing tool, or create and use a processing chain by specifying start and end nodes in the hierarchy and providing the input document. In comparison, GATE offers a user interface just for creating and using processing chains, and these have to be built manually, requiring at least a well informed user. UIMA is even more oriented to the CL specialist, offering very little in terms of visual user interaction.
Every one of the three main functionalities is easier to perform using ALPE. Both UIMA and GATE require some formal description to be written for each new resource integrated into the system, but ALPE generates these formal descriptions automatically. When adding a new processing tool, ALPE has much more permissive restrictions with regard to what tool can be integrated: it basically has to be either a webservice or a command line executable under Windows or Linux. GATE allows the user

(see the TEI, XCES, ISLE, etc. initiatives). Moreover, the Semantic Web, with its tremendous need for interconnection and integration of resources and applications in communicating environments, vividly boosts the appeal of standardization. It is therefore foreseeable that more and more designers will adopt recognized standards, in order to allow easy interoperability of their applications. A realistic view on the matter would bring the standards into focus while also providing means for users to interact with the system even if they do not rigorously comply with the standards.
We have seen already that, by classification, any schema could be placed in the hierarchy. Of course, classification could increase the number of nodes of the hierarchy in an uncontrollable way. The proliferation could be caused not so much by the semantic diversity of the annotations, as by the differences in name spaces (names of tags and attributes). Suppose one wants to connect a new file to the hierarchy in
order to exploit its processing power. What s/he has to do is to first classify the metadata scheme of the file. If the system reports the result as being a new node in the hierarchy, then its position also gives indications of its similarity/dissimilarity with the neighboring schemas. A visual inspection of the names used can reveal, for instance, that a simple translation operation can make the new node identical to an existing one. This means that the new schema is not new for the hierarchy, although the set of conventions used, which make it different from those of the hierarchy, are imposed by the restrictions of the user’s application.
Technically, this can be achieved by temporarily creating links between the new schema classified by the hierarchy, as a new node, and its corresponding schema in the hierarchy. Processing along such a link is different from the usual behavior associated with the edges of the graph and is specific to wrappers. It describes a translation process, in

to integrate just Java and Perl based tools, and this is done by writing some dedicated code. UIMA allows only Java based tools to be integrated, and only after significant implementations and changes to the original code.
When creating and using processing chains, the most significant advantage of ALPE is the automatic creation of processing chains, and the fact they can be created between any two formats in the existing hierarchy (if the required modules are available). GATE and UIMA offer relatively simple ways to create and use processing chains, but the user has to be sure the required modules exist and have compatible input/output formats. Also, ALPE deals much more easily with multilinguality, as it has a module that performs language identification automatically for each input file, then selects the corresponding tools and language resources, if available. GATE and UIMA are mainly focused on English, GATE having some modules for Romanian, but the user has to make sure to select those and
not the English ones when building a processing chain for a Romanian document.
Let us consider a simple use-case: the user has two processing tools he wants to use on the same input file and merge the results in an output file. Using ALPE he just has to use the available functionality to integrate the two tools in any hierarchy (even if not all the annotation formats involved are currently available: new nodes will be created automatically), then input the file and specify the required output format (node). Using GATE, the user has to implement the integration of the tools to make them available to the processing chain building interface, build and run two processing chains, one for each tool, then merge the results manually (GATE does not allow parallel processing and merging of annotations). UIMA performs this task basically the same as GATE, requiring even more implementation when integrating the new tools, but can perform annotations merging.

3. ALPE and LT4eL
The model presented in the previous sections is partially implemented and will be used (in an intermediate version)

which the annotation is not enriched, but rather names of tags and attributes are changed. Ideally, the processing abilities of the hierarchy should also include the capability to automatically discover the wrapping procedure. This task is not trivial since it would require that the hierarchy “understands” the intentions hidden behind the annotation, displaying an intelligent behavior which is not easy to implement, but could make an interesting topic for further research.

repository form for the project's LOs. Practically, by clicking on a node the users have access to all files observing the specific annotation. A dedicated portal was built that includes functionalities for uploading and downloading the project's LOs.
The formats (ALPE schema definitions) shown in figure 2 are:
- doc: Format of MS Word and OpenOffice text documents;
- pdf: Portable Document Format generated by any of the available
software;
- latex: LaTeX document format;
- html: HTML format for web pages and documentation;
- txt: simple text format, without markup and viewable/editable with basic editors;
- other: formats other than the ones named above;
- sxml: XML format with basic formatting information extracted from the txt and/or html formats;
- morpho: annotated format with added morphological information;
- tok: annotated format with added tokenisation;
- pos: annotated format with added part of speech information;
- lemma: annotated format with marked lemmas for words;
- NP: annotated format with marked Noun Phrases;
- wp2xml: XML format with morphological and syntactical information merged from the above formats;
- akw: XML with automatically generated keywords annotation;

in the framework of the LT4eL European FP6 project.
As the main objective of the project is to provide functionalities based on language technologies and to integrate semantic knowledge in Learning Management Systems (LMS), the first step was to create an environment for collecting and (semi-)automatically exploiting language resources and tools. For the 9 languages involved (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese and Romanian), a multilingual corpus, partially parallel, of almost 9 million words was collected, annotated and uploaded on the project’s portal⁵. There are about 30 linguistic tools on the portal, corresponding to various processing steps, hence to edges of the hierarchy of annotation schemas. The following sub-sections will briefly describe the LT4eL formats and tools which will be involved in the adaptation of ALPE for LT4eL.

3.1 The resources in the hierarchy of schemas
The linguistic resources, called learning objects (LO), were first collected as documents corresponding to formats on the first layer of the ALPE LT4eL hierarchy (see Fig. 2), according to their language, format (doc, pdf, plain text, html or other), domain (broadly: the use of computers in
education), and IPRs. In figure 2 an ALPE hierarchy, as described in this paper, includes the sxml node and all others below it. After their automatic conversion to XML, using a specially created converter⁶, the objects were linguistically annotated (tokens, part-of-speech, lemma), hence placed on the second layer of the hierarchy (Figure 2). After another conversion to the specific format used as input for the keyword extractor developed within the project, the resources are taken to the third layer, corresponding to the annotation of keywords and definitory contexts. This specific hierarchy was used in LT4eL as a

⁵ http://consilr.info.uaic.ro/uploads_lt4el/
⁶ http://ufal.mff.cuni.cz/~spousta/lt4el/html2xml/LT4ELBase.dtd

- adef: XML with automatically generated definitions annotation;
- axml: XML format combining the automatically generated keywords and definitions annotations.

3.2 The tools of LT4eL
The LT4eL corpora required extensive processing. The tools used for working with the LT4eL corpora are either existing tools, which were adapted for LT4eL purposes, or tools developed as part of the project.

[Figure 2: The schema hierarchy for LT4eL. Nodes, from top to bottom: doc, pdf, latex, other; html, txt; sxml; morpho, tok, pos, lemma, NP; wp2xml; akw, adef; axml. The figure groups the nodes into Layer 1, Layer 2 and Layer 3.]

corpora using this common XML standard for input files. The keywords and definitions were considered with respect to the LT4eL domain: teaching computer science and e-learning. All processing modules are under a continuous process of improvement. One of the final goals of the project is to fully develop these technologies with

Initially, all collected documents were in various formats and they were converted to an XML format preserving some of the visual formatting (for further processing). This step involved the development of a conversion tool from an intermediate html or txt version of the original document (obtained using existing tools) to an XML format observing the LT4eL format. Basically, this tool, combined with the
original (automated) conversion from the original format to the intermediate format, allows the user to input almost any kind of file in the processing hierarchy and obtain a required XML format. The conversion tool is configurable for various output formats and its source code is available, so this will become one of the core tools in the ALPE system.
For the next step, adding basic linguistic annotation to the XML corpora, existing tools were employed. Each partner language identified, adapted and used its own tools, and produced various types of annotated XML. All these formats were transformed according to a common DTD, to include linguistic information such as token markings, lemma information, POS tags, morpho-syntactical characteristics and noun phrase identifiers. Two tools were implemented to mark the keywords and definitions in the

modularity and language-independence as two of the main characteristics, hence making LT4eL an ideal environment for the practical testing of a system such as ALPE.

3.3 ALPE-LT4eL
The LT4eL hierarchy represents the first significant deployment of an ALPE environment. The nature and requirements of the project impose the following characteristics of the system:
- Has to be able to handle resources/tools for 8 languages (the initial version includes tools for 3 languages, the others having to be added later).
- Has to work with files in 12 different formats. ALPE automatically identifies the file format and language.
- Has to be able to handle a wide variety of processing modules. Preliminary inquiries showed that most modules are either Java or Perl based. Some modules are available only as executables.
- Has to be able to handle diverse processing configurations in different languages, from strictly serial to a combination between serial and parallel. In

versatility for the user with respect to the file format of the input and ease of expanding the functionalities of the system for other languages.
Other envisaged deployments of ALPE hierarchies will
be used in Question Answering, Textual Entailment and average, a processing chain involves 4 processing Anaphora Resolution systems Eventually, ALPE will be modules (and several of the ALPE core modules, like deployed as a global NLP hierarchy dynamically built and language identification and annotations merging) usable on the net as a webservice The ALPE-LT4eL hierarchy can be constructed manually, first by defining the schema definitions graph, then adding 5 Acknowledgments LT4eL modules to the edges The fact that the formats and Part of the work described in this paper was funded through modules involved are already available and are not subject the LT4eL project, STREP 027391 in FP6-2004-IST-4 to significant changes allowed the hierarchy to be fully build prior to its actual use The ALPE-LT4eL hierarchy 6 References ] D Cristea and C Butnariu Hierarchical XML representation allows a user to input a document in any LT4eL format and [1 automatically obtain any of the other formats in the for heavily annotated corpora Proceedings of the LREC on XML-Based Richly Annotated Corpora, hierarchy, except those above the txt and html level 2004 WorkshopLisbon, Portugal 2004 The integration of LT4eL tools and formats in an [2 ] D Cristea, C Forăscu, I Pistol Requirements-Driven ALPE hierarchy makes processing and adding resources to Automatic Configuration of Natural Language Applications the LT4eL corpus a much simpler and quicker task All the B Sharp (Ed ): Proceedings of the 3rd International transitions are performed automatically, as well as the Workshop on Natural Language Understanding and detection of the input file, required resources and modules Cognitive Science - NLUCS 2006, in conjunction with ICEIS The manual configuration works faster only for simple 2006, Cyprus INSTICC Press, Portugal 2006 ] H Cunningham, D Maynard, K Bontcheva, V Tablan processing chains When the number of integrated modules [3 increases, the advantage of the automated system becomes 
GATE: A framework and graphical development visible Using the ALPE type of processing is more environment for robust NLP tools and applications In justified in complex projects involving large numbers of Proceedings of the 40th Anniversary Meeting of the ACL data formats and processing modules Also, this type of (ACL’02), US 2002 ] H Cunningham, V Tablan, K Bontcheva, M Dimitrov processing can give access for non-specialist users easy to [4 resources and tools, being able to test any LT4eL module Language engineering tools for collaborative corpus using any file available in the many possible input formats annotation Proceedings of Corpus Linguistics 2003, Moreover, ALPE-LT4eL automatically identifies the Lancaster, UK 2003 ] P Monachesi, L Lemnitzer, K Simov Language language and checks whether a XML file (input for one of [5 the modules) conforms to the required DTD assuring the Technology for eLearning Proceedings of EC-TEL 2006, in correct execution of the processing flows Innovative Approaches for Learning and Knowledge The conceptual design of the ALPE hierarchy makes Sharing, LNCS 0302-9743, pp 667-672 2006 ] I Pistol, D Trandabăţ, A Iftene, D Cristea, C Forăscu possible, in a later stage, to include new nodes (formats) [6 and modules, as well as modules and resources for Processing Romanian linguistic resources in the LT4eL processing files in other languages project (in Romanian) Proceedings of the Workshop urces and Tools for Processing Romanian scu, D Tufiş, D Cristea (eds ) University 4 Conclusions Linguistic ResoLanguage, C Foră In this paper we have argued for augmenting the theoretical Al I Cuza Publishing House 2006 model of an automatic configuration of NLP architectures, [7 ] D Ferrucci and A Lally UIMA: an architectural approach to introduced in and , with new features that can unstructured information processing in the corporate research accommodate multilinguality, distributivity, versioning of environment Natural Language Engineering 10, No 
3-4, language resources, manual versus automatic annotation, 327-348 2004 IPR and cost issues, as well as the diversity of annotation [8 ] N Ide, L Romary, E Clergerie International standard for a styles For the first time, the ALPE environment has found linguistic annotation framework In Proceedings of the HLT- an application field in an European project dedicated to NAACL'03 Workshop on the Software Engineering and applying linguistic processing to e-learning Although only Architecture of Language Technology, Canada 2003 a part of the functionality of the ALPE framework has been exploited in this context, since the hierarchy itself was considered given and therefore not dynamically built, the integration of ALPE in the LT4eL LMS can bring 
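
As an illustration of the behaviour described in Section 3.3, where a document entering the hierarchy in one format is carried automatically to another, the chain lookup can be viewed as a path search in a directed graph whose nodes are formats and whose edges are labelled with module names. The following minimal Python sketch is only an assumption about how such a lookup could work, not the ALPE implementation: the format names, the module names and the `find_chain` helper are all hypothetical, and only edges that add annotation are modelled.

```python
from collections import deque

# Hypothetical hierarchy: each key is a format (node); each value lists
# (target format, module name) pairs, i.e. labelled edges.
# None of these names come from the LT4eL project; they are placeholders.
HIERARCHY = {
    "html": [("base-xml", "html2xml")],
    "txt": [("base-xml", "txt2xml")],
    "base-xml": [("tokenized", "tokenizer")],
    "tokenized": [("pos-tagged", "pos_tagger")],
    "pos-tagged": [("np-chunked", "np_chunker")],
    "np-chunked": [],
}


def find_chain(hierarchy, source, target):
    """Return the shortest list of module names leading from source to target.

    Uses breadth-first search over the directed graph and returns None
    when the target format cannot be reached from the source format.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for next_fmt, module in hierarchy.get(fmt, []):
            if next_fmt not in visited:
                visited.add(next_fmt)
                queue.append((next_fmt, chain + [module]))
    return None


print(find_chain(HIERARCHY, "txt", "pos-tagged"))
# ['txt2xml', 'tokenizer', 'pos_tagger']
print(find_chain(HIERARCHY, "np-chunked", "txt"))
# None
```

Breadth-first search returns the chain with the fewest modules. A real deployment would presumably also take the document language, the available resources and DTD conformance into account, which this sketch leaves out.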